This report explores a dataset containing candidates and attributes for approximately 490,000 contributions.
## 'data.frame': 493422 obs. of 12 variables:
## $ name : Factor w/ 107161 levels "& DREW BURKE, MELANIE",..: 1 2 3 3 3 4 5 5 5 5 ...
## $ canidate : Factor w/ 3 levels "cand_nm","Sanders, Bernard",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ city : Factor w/ 2251 levels "29 PALMS","ACAMPO",..: 1067 807 681 681 681 956 225 225 225 225 ...
## $ state : Factor w/ 2 levels "CA","contbr_st": 1 1 1 1 1 1 1 1 1 1 ...
## $ zip : num 9.55e+08 9.25e+08 9.29e+08 9.29e+08 9.29e+08 ...
## $ employer : Factor w/ 29481 levels "","-","- NONE -",..: 12217 10319 13216 13216 13216 3465 10227 10227 10227 10227 ...
## $ occupation: Factor w/ 13683 levels "","-","----",..: 6653 6653 7000 7000 7000 8062 5233 5233 5233 5233 ...
## $ amount : num 50 244 15 50 50 ...
## $ date : POSIXlt, format: "2016-04-17" "2015-09-04" ...
## $ receipt : Factor w/ 37 levels "","* EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING",..: 1 1 1 1 1 1 1 1 1 1 ...
## $ lon : num -123 -117 -118 -118 -118 ...
## $ lat : num 39.4 34 33.8 33.8 33.8 ...
## name canidate
## WEIL, MONIQUE : 257 cand_nm : 0
## MCLENNAN, MARLYN : 240 Sanders, Bernard:407164
## AUSLENDER, LEONARD: 239 Trump, Donald J.: 86258
## ISERI, MARTIN : 220
## SPEAR, JOSEPH : 210
## DAVIDSON, LISA : 203
## (Other) :492053
## city state zip
## LOS ANGELES : 33799 CA :493422 Min. : 0
## SAN FRANCISCO: 32011 contbr_st: 0 1st Qu.:900683036
## SAN DIEGO : 17770 Median :926777949
## OAKLAND : 13124 Mean :759234179
## SAN JOSE : 11642 3rd Qu.:946091302
## BERKELEY : 10707 Max. :961628693
## (Other) :374369
## employer occupation amount
## NONE : 80793 NOT EMPLOYED :105838 Min. :-10500.00
## NOT EMPLOYED : 43044 RETIRED : 50674 1st Qu.: 15.00
## RETIRED : 37507 INFORMATION REQUESTED: 16746 Median : 27.00
## SELF : 33131 TEACHER : 11655 Mean : 68.16
## SELF EMPLOYED: 29749 ENGINEER : 8584 3rd Qu.: 50.00
## (Other) :268832 (Other) :299849 Max. : 10000.00
## NA's : 366 NA's : 76
## date
## Min. :2015-05-04 00:00:00
## 1st Qu.:2016-02-27 00:00:00
## Median :2016-04-06 00:00:00
## Mean :2016-04-09 14:49:04
## 3rd Qu.:2016-05-31 00:00:00
## Max. :2016-12-31 00:00:00
##
## receipt
## :489797
## Refund : 3137
## * EARMARKED CONTRIBUTION: SEE BELOW REATTRIBUTION/REFUND PENDING: 431
## REATTRIBUTION/REFUND PENDING : 16
## REATTRIBUTION TO SPOUSE, TERRIE SCHULTZ. : 4
## REATTRIBUTION FROM SPOUSE, DEDE WHITESIDE. : 3
## (Other) : 34
## lon lat
## Min. :-124.286 Min. :23.15
## 1st Qu.:-122.236 1st Qu.:34.05
## Median :-120.572 Median :36.68
## Mean :-120.112 Mean :36.00
## 3rd Qu.:-118.235 3rd Qu.:37.80
## Max. : 2.239 Max. :48.88
## NA's :2727 NA's :2727
I log-transformed the long-tailed amount data to better understand the distribution of amount for each candidate. The transformed amounts look relatively similar for both candidates. Sanders, however, has more contributions below the fifty-dollar range, with counts peaking above ten thousand, while Trump appears to have a larger count of amounts around the nine-hundred-dollar range. Let’s investigate some other variables such as city, occupation and name.
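The log transformation described above can be sketched roughly as follows. This is a minimal illustration on made-up rows, not the code used for the plots; the data frame `contrib` and its values are hypothetical, and only the column names `canidate` and `amount` come from the `str()` output.

```r
# Hypothetical toy data standing in for the contributions table;
# the column names `canidate` and `amount` follow the str() output above.
contrib <- data.frame(
  canidate = c("Sanders, Bernard", "Sanders, Bernard", "Sanders, Bernard",
               "Trump, Donald J.", "Trump, Donald J.", "Sanders, Bernard"),
  amount   = c(15, 27, 50, 900, 250, -100)
)

# log10 is undefined for refunds (negative amounts) and for zero,
# so keep only strictly positive amounts before transforming.
positive <- contrib[contrib$amount > 0, ]
positive$log_amount <- log10(positive$amount)

# Count contributions per candidate in log10 bins of width 0.5.
bins <- cut(positive$log_amount, breaks = seq(0, 4, by = 0.5),
            include.lowest = TRUE)
counts <- table(positive$canidate, bins)
print(counts)
```

Dropping the non-positive amounts first is a design choice: refunds would otherwise produce `NaN` values under `log10` and have to be analysed separately.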
Most major cities contributed mainly to Bernie’s campaign, with fairly similar contributions for the two candidates in Bakersfield and Huntington Beach. The largest occupation group among contributors to Bernie’s campaign is “NOT EMPLOYED”, while more of Trump’s contributors are “RETIRED”. Finally, the names with more than two hundred contributions all seem to support the Bernie campaign rather than Trump’s. It looks like no one made a large number of repeat contributions to the Trump campaign in California.
## not_employed_df$state: CA
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2300.00 15.00 27.00 50.16 50.00 5000.00
## --------------------------------------------------------
## not_employed_df$state: contbr_st
## NULL
If we compare contributions from unemployed donors between the two candidates, Mr. Sanders appears to have the larger number. Most people who are unemployed fall into the zero-to-hundred-dollar range, while we see a converging pattern as the amount increases. Within the summary we see that the median is exactly twenty-seven dollars, the same amount Mr. Sanders cited in his campaign speeches.
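The `NULL` block for `contbr_st` in the summary above comes from a factor level with no rows. A small sketch of how that empty group could be dropped before summarising, assuming the summary was produced with `by()`; the rows below are hypothetical and only the names `not_employed_df`, `state` and `amount` follow the output.

```r
# Hypothetical subset; `state` carries an unused "contbr_st" level,
# as in the str() output (possibly a header row read in as data).
not_employed_df <- data.frame(
  state  = factor(c("CA", "CA", "CA", "CA"), levels = c("CA", "contbr_st")),
  amount = c(15, 27, 27, 50)
)

# Dropping unused levels removes the empty "contbr_st" group,
# so by() no longer prints a NULL block for it.
not_employed_df$state <- droplevels(not_employed_df$state)
by_state <- by(not_employed_df$amount, not_employed_df$state, summary)
print(by_state)
```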
I wanted to take a step-by-step approach to the names who contributed the most. All of these names contributed to the Bernie Sanders campaign, as was visualized above. When we zoom in, separate the names into individual panels, and take a log transformation of the slightly skewed chart, we get a very clear visualization of how these contributors spread out their payments. I didn’t want to get into the refund payments just yet, because that requires investigating multiple variables, but as you can see from the chart above, the contributions to the Sanders campaign were spread across a wide variety of amounts.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1153.00 2.70 3.00 8.03 10.00 742.40
In this second graph concerning occupation you can see that people had standard occupations such as Teacher, Not Employed, Retired, Clerk and Photographer. What struck me as odd in this bar plot was that some people changed their occupation between donations; for example, MCLENNAN, MARLYN was listed as both RETIRED and NOT EMPLOYED.
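One way to find contributors listed under more than one occupation is to count distinct occupation labels per name. The sketch below uses hypothetical rows; only the MCLENNAN, MARLYN example and the column names `name` and `occupation` come from the report.

```r
# Hypothetical rows; "DOE, JANE" is invented for illustration.
contrib <- data.frame(
  name       = c("MCLENNAN, MARLYN", "MCLENNAN, MARLYN",
                 "DOE, JANE", "DOE, JANE"),
  occupation = c("RETIRED", "NOT EMPLOYED", "TEACHER", "TEACHER"),
  stringsAsFactors = FALSE
)

# Count distinct occupation labels per contributor name.
n_occ <- tapply(contrib$occupation, contrib$name,
                function(x) length(unique(x)))

# Names listed under more than one occupation.
changed <- names(n_occ)[n_occ > 1]
print(changed)
```

Note that a changed label does not necessarily mean a changed job: free-text fields such as occupation are often filled in inconsistently from one donation to the next.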
This report explores a dataset containing candidates and attributes for approximately 490,000 contributions with 12 features (name, canidate, city, state, zip, employer, occupation, amount, date, receipt, lon, lat). The variables name, canidate, city, state, employer, occupation and receipt are factors.
Other observations:
* The cities with the most contributors lean liberal.
* The largest contributor occupation is "not employed".
* The largest count of contributions comes from Bernie supporters.
* The largest refund (-10,500) is larger than the 10k upper bound on contributions.
* The biggest contribution sits at the upper bound of 10k.
The main features of the dataset are amount, city and occupation. I’d like to determine which groups contribute the most to each respective candidate, what the median and maximum contributions are for each candidate, and which locations contribute the most.
The only feature I truly changed within the data set is the date, which I converted to a proper date-time type so that I could use it in the bivariate plots section. Other than that, most variables were just renamed to be more readable.
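A date conversion of this kind might look like the sketch below. The raw input format is not shown in this report, so the strings and the format `"%Y-%m-%d"` are assumptions; the real file may need a different format string.

```r
# Hypothetical raw values; the real input format is not shown in the report.
raw_dates <- c("2016-04-17", "2015-09-04")

# Parse with an explicit format and time zone so the result does not
# depend on the machine's locale or local time zone.
parsed <- as.POSIXct(raw_dates, format = "%Y-%m-%d", tz = "UTC")

# Month labels are convenient for grouping in later date-based plots.
month_label <- format(parsed, "%Y-%m")
print(month_label)
```

`POSIXct` is used here rather than the `POSIXlt` type shown in the `str()` output because it stores each date as a single number, which tends to behave more predictably as a data frame column.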
Did you perform any operations on the data to tidy, adjust, or change the form of the data? If so, why did you do this?
I log-transformed the right-skewed amount and the count of contributions for each candidate. While the amount is still similarly skewed after the transformation, it gave me a clearer picture of how the data set behaves. Primarily, we can see that Sanders’ contributors tend to give more in the below-hundred-dollar range, while Trump’s contributors tend to follow a higher contribution pattern. Most of the distributions looked reasonable. The only really unusual thing I saw was the occupation changing for the same person in the largest-contributions-by-occupation graph.
## date amount
## Min. :2015-05-04 00:00:00 Min. :-10500.00
## 1st Qu.:2016-02-27 00:00:00 1st Qu.: 15.00
## Median :2016-04-06 00:00:00 Median : 27.00
## Mean :2016-04-09 14:49:04 Mean : 68.16
## 3rd Qu.:2016-05-31 00:00:00 3rd Qu.: 50.00
## Max. :2016-12-31 00:00:00 Max. : 10000.00
## canidate
## Sanders, Bernard:407164
## Trump, Donald J.: 86258
##
##
##
##
I created a subset of the data with amount, canidate and date. The problem with this dataset is that it has few truly numerical variables; it is primarily location and contribution amount. Still, the matrix plot shows some interesting results. For instance, in the candidate-by-date panel, one candidate’s distribution looks roughly normal. Secondly, between January and July there is a large growth in contributions. Let’s investigate this further.
As the dates move toward the election, Donald Trump’s contributions grow while Bernie’s decrease. This is likely because Donald Trump went on to become the Republican Party candidate while Bernie Sanders had to drop out of the race. Let’s look at how these dates are distributed for each candidate.
## merge_samp$canidate: Sanders, Bernard
## Min. 1st Qu. Median
## "2015-05-27 00:00:00" "2016-02-13 00:00:00" "2016-03-25 00:00:00"
## Mean 3rd Qu. Max.
## "2016-03-14 15:50:05" "2016-04-30 00:00:00" "2016-08-04 00:00:00"
## --------------------------------------------------------
## merge_samp$canidate: Trump, Donald J.
## Min. 1st Qu. Median
## "2015-06-18 00:00:00" "2016-07-06 00:00:00" "2016-08-02 00:00:00"
## Mean 3rd Qu. Max.
## "2016-08-07 09:56:08" "2016-09-12 00:00:00" "2016-12-23 00:00:00"
Clearly we see that contributions to Sanders’ campaign increase as the months go on, peaking around the third or fourth month of 2016. We don’t see large contributions to Donald Trump until around June. This could be because he was nominated around this time, or it’s possible that California was still not interested in Donald Trump being the Republican candidate and was hoping for a more moderate conservative like Jeb Bush or John Kasich. These plots paint a detailed portrait of each candidate’s time running for president.
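The timing pattern can also be checked with a simple table of contributions per month and candidate. This is a sketch on hypothetical rows; the name `merge_samp` and its columns follow the output above, but the dates are invented for illustration.

```r
# Hypothetical sample rows; `canidate` and `date` follow the report's names.
merge_samp <- data.frame(
  canidate = c("Sanders, Bernard", "Sanders, Bernard", "Sanders, Bernard",
               "Trump, Donald J.", "Trump, Donald J.", "Trump, Donald J."),
  date = as.Date(c("2016-02-13", "2016-03-25", "2016-03-30",
                   "2016-07-06", "2016-08-02", "2016-08-15"))
)

# Number of contributions per calendar month, split by candidate.
monthly <- table(format(merge_samp$date, "%Y-%m"), merge_samp$canidate)
print(monthly)
```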
As you can see above, the line graph shows the trend by date for Sanders and Trump. One observation is that Sanders’ data is not as consistent as Trump’s: it tends to peak and fall between high donations and refunds, while Trump’s looks more uniform. Could this have something to do with anxiety in the election cycle? Another thing to note is the negative amounts on the y axis. Let’s dive into this a bit more closely.
As you can see above, most of the refunds were returned by the Bernie Sanders campaign. While some refunds were made by Donald Trump, Bernie Sanders had the most. However, one thing to keep in mind is that Bernie’s dataset is larger, so this could simply reflect more data points in the plot. If we look at amounts, we can see that Bernie Sanders has more contributions in the positive direction as well. Let’s look at these in more depth statistically.
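Because the two candidates have very different numbers of rows, a refund share per candidate is easier to compare than a raw refund count. The sketch below uses hypothetical amounts and treats any negative amount as a refund, which is an assumption about how refunds are recorded.

```r
# Hypothetical amounts; negative values are treated as refunds (assumption).
contrib <- data.frame(
  canidate = c(rep("Sanders, Bernard", 8), rep("Trump, Donald J.", 2)),
  amount   = c(27, 15, 50, -100, 27, 10, -50, 35, 250, -2000)
)

# Flag refunds, then take the mean of the flag per candidate:
# the mean of a logical vector is the proportion of TRUE values.
is_refund <- contrib$amount < 0
refund_share <- tapply(is_refund, contrib$canidate, mean)
print(refund_share)
```

In this toy example Sanders has more refunds in absolute terms (two against one) but a smaller share of refunds, which is exactly the effect of unequal dataset sizes mentioned above.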
## bernie_city$city: BERKELEY
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -300.00 -169.50 -100.00 -119.80 -75.00 -5.27
## --------------------------------------------------------
## bernie_city$city: HUNTINGTON BEACH
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -100.00 -75.00 -50.00 -58.33 -37.50 -25.00
## --------------------------------------------------------
## bernie_city$city: LOS ANGELES
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2280.0 -362.4 -214.0 -329.1 -71.0 -10.0
## --------------------------------------------------------
## bernie_city$city: OAKLAND
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -1000.00 -294.60 -39.59 -244.00 -35.00 -5.00
## --------------------------------------------------------
## bernie_city$city: SACRAMENTO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -500.00 -410.00 -137.50 -222.90 -71.25 -12.50
## --------------------------------------------------------
## bernie_city$city: SAN DIEGO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -500.00 -200.00 -100.00 -158.70 -38.75 -17.50
## --------------------------------------------------------
## bernie_city$city: SAN FRANCISCO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2700.0 -757.8 -500.0 -653.2 -142.2 -5.0
## --------------------------------------------------------
## bernie_city$city: SAN JOSE
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -500.00 -236.20 -123.00 -171.80 -33.36 -20.23
## trump_city$city: LOS ANGELES
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -3027 -3027 -3027 -3027 -3027 -3027
## --------------------------------------------------------
## trump_city$city: SAN FRANCISCO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2000 -2000 -2000 -2000 -2000 -2000
## largest_amount_bernie_trump$city: BAKERSFIELD
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 20.0 37.5 130.0 100.0 2700.0
## --------------------------------------------------------
## largest_amount_bernie_trump$city: BERKELEY
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 60.17 50.00 2600.00
## --------------------------------------------------------
## largest_amount_bernie_trump$city: HUNTINGTON BEACH
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 15.0 35.0 89.9 100.0 2400.0
## --------------------------------------------------------
## largest_amount_bernie_trump$city: LOS ANGELES
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.80 15.00 27.00 67.72 50.00 5400.00
## --------------------------------------------------------
## largest_amount_bernie_trump$city: OAKLAND
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 61.11 50.00 2700.00
## --------------------------------------------------------
## largest_amount_bernie_trump$city: SACRAMENTO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 58.78 50.00 2000.00
## --------------------------------------------------------
## largest_amount_bernie_trump$city: SAN DIEGO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 61.76 50.00 2700.00
## --------------------------------------------------------
## largest_amount_bernie_trump$city: SAN FRANCISCO
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 64.55 50.00 2700.00
## --------------------------------------------------------
## largest_amount_bernie_trump$city: SAN JOSE
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 15.00 27.00 55.96 50.00 2000.00
##
## Pearson's product-moment correlation
##
## data: merge_bs_dt_df$zip and merge_bs_dt_df$amount
## t = -150.4, df = 493420, p-value < 2.2e-16
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.2120382 -0.2067023
## sample estimates:
## cor
## -0.2093718
We can now see that Sanders’ refunds have a median of no more than five hundred dollars in magnitude in every city listed, while Trump’s refunds show up only as thin strips, with a single refund value in each of Los Angeles and San Francisco. If we view the positive amounts, meaning the contributions, the median by city sits at the twenty-seven-dollar mark for Bernie Sanders (which was a campaign talking point), while Trump’s falls in the forty-nine to fifty range.
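A per-city, per-candidate median like the one discussed here can be computed with a two-way `tapply()`. The rows below are hypothetical and chosen only to illustrate the mechanics; the column names follow the report.

```r
# Hypothetical rows; values are invented for illustration.
contrib <- data.frame(
  city     = c("LOS ANGELES", "LOS ANGELES", "LOS ANGELES", "LOS ANGELES",
               "LOS ANGELES", "SAN DIEGO", "SAN DIEGO", "SAN DIEGO"),
  canidate = c("Sanders, Bernard", "Sanders, Bernard", "Sanders, Bernard",
               "Sanders, Bernard", "Trump, Donald J.", "Sanders, Bernard",
               "Trump, Donald J.", "Trump, Donald J."),
  amount   = c(15, 27, 50, -100, 50, 27, 40, 60)
)

# Keep contributions only; refunds (negative amounts) are summarised separately.
contributions <- contrib[contrib$amount > 0, ]

# Median contribution for every city/candidate combination.
median_by_city <- tapply(contributions$amount,
                         list(contributions$city, contributions$canidate),
                         median)
print(median_by_city)
```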
How did the features of interest vary with other features in the dataset?
The only correlation I could compute on this dataset was between amount and zip code, using Pearson’s method. It gave a weak negative correlation of about -0.21, which suggests that contribution amounts are spread fairly evenly across zip codes. I must stress again that the two datasets gathered from the government website differ in size, with Bernie’s being larger. Even taking that into account, Bernie’s campaign talking point that the average American contributes roughly twenty-seven dollars seems to be right on the money. However, Sanders did not explain the large number of refunds he was issuing to contributors. Trump’s contributions, while fewer, seem to follow a steadier, more uniform pattern, which suggests he was steadily reaching the finish line.
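For reference, a Pearson test like the one reported above can be run with `cor.test()`. The vectors below are hypothetical. ZIP codes are labels rather than measurements, so a Pearson correlation on them is of limited interpretive value and is shown here only to illustrate the call.

```r
# Hypothetical ZIP codes and amounts; not taken from the dataset.
zip    <- c(90001, 90210, 92101, 94102, 94607, 95814)
amount <- c(50, 27, 100, 15, 27, 35)

# Pearson's product-moment correlation with its t statistic,
# degrees of freedom (n - 2) and confidence interval.
result <- cor.test(zip, amount, method = "pearson")
print(result$estimate)
print(result$conf.int)
```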
There isn’t really any strong relationship across the board in this dataset. It mostly shows that the Sanders campaign had many contributors whose giving went up and down over the campaign, while Trump’s supporters were steady and consistent.
The strongest thing I found was Bernie Sanders’ talking point of twenty-seven dollars. I thought that was pretty wonderful.
These last two plots show, for each candidate and major city, the relationship between date and amount contributed. As you can clearly see, Sanders’ contributions in these cities become more sporadic as the months go on, while Trump’s tend to fall in a more uniform pattern (this is not to say that either is good or bad). In the box plot we get a clearer picture of the amounts received: Sanders falls into a median range of around 27 dollars, while Trump’s median per city varies more. I personally believe the box plot shows a clear image of what’s going on in these cities.
These last two graphs show the overall range of where people are contributing. As you may have noticed in the map, there is a lot of red within the San Diego region and more across the ocean. However, these regions do have some dots for Bernie. Purple represents counties and cities with equal amounts of contributions. The heat map looks at this in more depth with a sample of 50 contributors. Within the heat map, most of the areas converge to purple, with some leaning blue and others leaning red. This gives us a good representation of the overall distribution of how voters were contributing in California.
Essentially, what the multivariate analysis taught me is that there really isn’t a left-leaning or right-leaning pattern of contributions within these two data sets. Most of the contributions seem to be equal in strength. While Bernie’s dataset is larger than Trump’s, the spread across the state between Democratic and Republican contributions seems to be fairly equal, with no direct correlation between one another.
I think the most surprising thing is that the data is NOT more blue and is spread fairly equally. I would have thought many more people supported Bernie Sanders than Donald Trump, and while they did support Sanders more, I thought it would be more extreme.
The distribution of amount by count is on a log scale and is still skewed. However, this does give us a first impression of a similar distribution between Sanders and Trump, with Sanders higher below the 100-dollar mark and Trump higher around the 800-dollar mark.
This plot indicates how major cities contributed to the campaigns. As you can see, Sanders has a median of roughly 27 dollars while Trump has a median around 50 dollars, depending on the city. This shows a more uniform, steady contribution for the Sanders campaign, while Trump’s is more all over the place. However, it looks to me like larger contributions were made for Trump within the third quartile, while Sanders has heavier outliers.
This heat map shows a sample of 50 contributors for Bernie and Trump, which shows a relatively equal pattern between contributions. While some areas lean more left and some more right, a majority fall into the purple category.
This report explores a dataset containing candidates and attributes for approximately 490,000 contributions in California during the 2016 election for President of the United States. Most of the exploration of the data set was done with amount, date, canidate, occupation, name and location. I wanted to explore whether certain regions were more impacted by Sanders’ campaign or Trump’s campaign. My conclusion is that, yes, more contributions were made to the Sanders campaign. However, I was surprised to see how many contributions were still made to the Trump campaign as well.
When I explored the data set, the first thing that surprised me was the number of refunds issued by the Sanders campaign, one of the largest being -10,500 dollars. While I know there are a lot of rules pertaining to campaign contributions, I wonder if it was a true refund or something that wasn’t done properly under California contribution rules.
The median donation to Sanders’ campaign fell around the 27-dollar mark, while the median for Donald Trump’s campaign fell around the 50-dollar mark. I was pleased to see that Sanders was correct in using this talking point in his speeches. The thing that struck me the most about these contributions is that a large share were made by people who were unemployed, which makes me question: do people get more involved in election cycles when they are unemployed? Do people tend not to care about contributions when they are employed? Does the upper middle class not care about elections because they are comfortable with their stage in life? It’s an interesting analysis to approach.
Finally, I believe an analysis of a more red state, or a swing state, comparing Sanders to Trump would be interesting. However, I chose this state because it is where I live.